Okay, so generally we have two problems in learning. One is called overfitting, which really means that we're not describing the function we actually want to describe. We're not describing F, but only the examples we have in S. And if they're noisy, for instance, going back to your question, then we're actually describing the noise along with them, which might be extremely complicated. That's what we call overfitting. And underfitting is when we can't capture the intended process, or the relationships, that are hidden in the data. That's always going to bug us. And those are real-world problems. In 1992,
I was at Carnegie Mellon University, where they had one of the first self-driving cars, and it was extremely impressive. It really could drive on small streets, and with a neural net, after I think 500 yards or so of driving, it could actually follow the road wonderfully. A nice little neural net running there. And they just had one problem: they wanted to drive on the highway, but that had really two problems, which were overfitting problems. At some point, the neural net realized that there are these very nice white lines at the edge of the road, which meant they could only drive in the left lane of the highway, because otherwise this car, which was actually an army truck, would take every exit. It forgot everything else it had learned, that there are trees and other cars, and just started following that white line. Slight problem there. Or, as they sometimes do in the US, at some point these white lines on the highway, or even the, what are they called, guardrails, just stop. When they did, the car, since it had learned that these guardrails are actually the best thing to follow, would kind of lose all of the information it had overfitted to and just basically cry out in panic and stop, because it couldn't navigate anymore. Those are typical overfitting effects. Rather than learning how to drive, this neural network was learning how to follow either white lines or guardrails. But it learned that totally autonomously, on its own, and became better and better at following the guardrails. So what these people actually did was take the video signal the car was following and blur the right and the left field. It's the same thing they do with horses, right? They have these blinders so that the horses don't get spooked by cars overtaking them. So essentially that's what they did: they made the data noisier so that the car could concentrate on the right thing and wouldn't overfit. Okay. So overfitting is an inherent problem, whereas underfitting is something you can usually cure with more data.
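To make that concrete, here is a minimal sketch of overfitting on a noisy sample, assuming numpy and scikit-learn are available; the hidden rule, the 15% label noise, and the depth limit are made-up illustration values, not anything from the lecture.

```python
# Minimal sketch: a tree fitted to noisy labels vs. a small, depth-limited tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hidden process F: label is 1 iff x0 + x1 > 1. The sample S is noisy:
# 15% of the labels are flipped, mimicking measurement noise.
X = rng.uniform(0, 1, size=(400, 2))
y_true = (X[:, 0] + X[:, 1] > 1).astype(int)
flip = rng.random(400) < 0.15
y = np.where(flip, 1 - y_true, y_true)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # unlimited depth vs. a small tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train acc={accuracy_score(y_train, tree.predict(X_train)):.2f}, "
          f"test acc={accuracy_score(y_test, tree.predict(X_test)):.2f}")

# Typically the unlimited tree fits the training noise almost perfectly
# but generalizes worse on the held-out test set than the shallow tree.
```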
So what can we do, say, in decision tree learning? What you usually want is to take your decision trees and then generalize them so that they actually become less overfitted. If you have a tree that's very deep and elaborate, it may actually make decisions that are only fitted to the particular subset of data. Remember: we have a process, some kind of mechanism we want to make predictions about, and we have a limited sample, the examples we've seen so far. But there's always the future, which is what we actually want to predict. So it might actually be a good idea not to overfit. And one thing we can do, when we have these decision trees, is to go over them again and generalize them so that they actually become better, less overfitted.
And the obvious idea in decision trees, where small trees are beautiful, is to go through the tree again, look at the nodes, and in some situations you get the feeling: I don't need to make that decision, that's a useless decision. So you throw out that terminal node and move the information we have there up the tree. That's called decision tree pruning. We do it on the terminal test nodes, the nodes in our tree that only have decision leaves under them. We test whether such a node is irrelevant, which in our system means it has a very low information gain. If the information gain is low, we're going to make decisions that have a very, very small empirical basis, if you will. And then you just replace that node by a leaf: you count the examples and take the mode, the majority classification, again.
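To make the pruning step concrete, here is a minimal sketch on a toy tree representation; the Leaf/Node classes, the entropy-based gain, and the 0.01 threshold are illustrative assumptions, not a fixed algorithm or API from the lecture.

```python
# Sketch of pruning terminal test nodes with low information gain.
from dataclasses import dataclass
from collections import Counter
from math import log2

@dataclass
class Leaf:
    examples: list          # class labels of the training examples reaching this leaf

@dataclass
class Node:
    attribute: str
    children: dict          # attribute value -> Leaf or Node

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(node):
    """Gain of a terminal test node: entropy before the split minus the
    weighted entropy of its leaves."""
    all_labels = [l for leaf in node.children.values() for l in leaf.examples]
    after = sum(len(leaf.examples) / len(all_labels) * entropy(leaf.examples)
                for leaf in node.children.values())
    return entropy(all_labels) - after

def prune(tree, threshold=0.01):
    """Bottom-up: if a node only has leaves under it and its gain is tiny,
    replace it by a single leaf holding all of its examples."""
    if isinstance(tree, Leaf):
        return tree
    tree.children = {v: prune(child, threshold) for v, child in tree.children.items()}
    if all(isinstance(c, Leaf) for c in tree.children.values()) \
            and information_gain(tree) < threshold:
        merged = [l for leaf in tree.children.values() for l in leaf.examples]
        return Leaf(examples=merged)
    return tree

def majority_label(leaf):
    """Prediction at a leaf: count the examples and take the mode."""
    return Counter(leaf.examples).most_common(1)[0][0]
```

In a real learner you would of course validate the gain threshold, but the bottom-up walk, the low-gain test, and the majority-vote replacement are the essence of the pruning described above.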